# Mixture of Experts architecture
Qwen3 0.6B GGUF
Apache-2.0
Qwen3 is the latest generation of the Tongyi Qianwen series of large language models, offering a range of dense and Mixture of Experts (MoE) models. Built on large-scale pretraining, Qwen3 delivers major gains in reasoning, instruction following, agent capabilities, and multilingual support.
Large Language Model English
Q
prithivMLmods
290
1
Qwen3 128k 30B A3B NEO MAX Imatrix Gguf
Apache-2.0
A GGUF quantized build of the Qwen3-30B-A3B Mixture of Experts model, extended to a 128k context window and quantized with the NEO Imatrix technique, supporting multilingual and multi-task use (a minimal GGUF loading sketch follows this entry).
Large Language Model Supports Multiple Languages
Q
DavidAU
17.20k
10
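Models like the one above ship as GGUF files intended for llama.cpp-compatible runtimes. Below is a minimal sketch, assuming llama-cpp-python is installed and a quantized file has already been downloaded from the repository; the file name is hypothetical and the context size should match what the specific quant actually supports.

```python
from llama_cpp import Llama

# Hypothetical local path to one of the NEO Imatrix quantized files;
# download the actual .gguf from the model repository first.
llm = Llama(
    model_path="Qwen3-30B-A3B-128k-NEO-MAX-Q4_K_M.gguf",
    n_ctx=131072,      # request the extended 128k context window
    n_gpu_layers=-1,   # offload all layers to GPU if VRAM allows
)

out = llm(
    "Summarize the idea behind Mixture of Experts in two sentences.",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```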
Granite 4.0 Tiny Base Preview
Apache-2.0
Granite-4.0-Tiny-Base-Preview is a 7-billion-parameter Mixture of Experts (MoE) language model developed by IBM, featuring a 128k-token context window and improved expressiveness through its use of Mamba-2.
Large Language Model
Transformers

G
ibm-granite
156
12
Qwen3 30B A3B GGUF
Apache-2.0
Qwen3 is the latest large language model series developed by Alibaba Cloud; it supports switching between thinking and non-thinking modes and excels at reasoning, multilingual tasks, and agent capabilities (a sketch of the thinking-mode toggle follows this entry).
Large Language Model English
Q
unsloth
261.09k
169
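Qwen3's thinking / non-thinking switch is exposed through its chat template. The sketch below uses Hugging Face transformers with the small Qwen/Qwen3-0.6B checkpoint as a stand-in for the 30B-A3B MoE (the exact model ID and the `enable_thinking` flag follow Qwen3's published usage and should be verified against the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"  # small stand-in; same chat-template family
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Why do MoE models activate only a few experts per token?"}
]

# enable_thinking toggles Qwen3 between thinking and non-thinking mode
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```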
Qwen3 0.6B Base
Apache-2.0
Qwen3-0.6B-Base belongs to the latest generation of the Tongyi Qianwen series of large language models, which offers a range of dense and Mixture of Experts (MoE) models.
Large Language Model
Transformers

Q
unsloth
10.84k
2
Qwen3 30B A3B GGUF
Apache-2.0
A large language model developed by Qwen, supporting a 131,072-token context length and excelling at creative writing, role-playing, and multi-turn conversation.
Large Language Model
Q
lmstudio-community
77.06k
21
Qwen3 235B A22B GGUF
Apache-2.0
A quantized build of the Qwen team's 235-billion-parameter large language model, supporting a 131k context length and a Mixture of Experts architecture.
Large Language Model
Q
lmstudio-community
22.88k
10
Timemoe 50M
Apache-2.0
TimeMoE is a family of time series foundation models, scaling to billion-parameter sizes, built on the Mixture of Experts (MoE) architecture and focused on time series forecasting.
Time Series Forecasting
T
Maple728
22.02k
13
Tanuki 8x8B Dpo V1.0
Apache-2.0
Tanuki-8x8B is a large-scale language model pretrained from scratch and optimized for dialogue tasks through SFT and DPO.
Large Language Model
Transformers Supports Multiple Languages

T
weblab-GENIAC
217
38
Norwai Mixtral 8x7B Instruct
A Norwegian large language model instruction-tuned from NorwAI-Mixtral-8x7B using approximately 9,000 high-quality Norwegian instructions.
Large Language Model
Transformers

N
NorwAI
144
2
Qwen2
Other
The Tongyi Qianwen Qwen2 series of large language models, covering parameter scales from 0.5 billion to 72 billion and including instruction-tuned variants.
Large Language Model
Q
cortexso
132
1
Hkcode Solar Youtube Merged
MIT
A Korean language model continually pretrained from SOLAR-10.7B, developed by the Fintech Department of Korea Polytechnics.
Large Language Model
Transformers Korean

H
hyokwan
3,638
1
Karakuri Lm 8x7b Chat V0.1
Apache-2.0
A Mixture of Experts (MoE) model developed by KARAKURI, fine-tuned from Swallow-MX-8x7b-NVE-v0.1 and supporting dialogue in English and Japanese.
Large Language Model
Transformers Supports Multiple Languages

K
karakuri-ai
526
23
Jambatypus V0.1
Apache-2.0
A large language model fine-tuned from Jamba-v0.1 with QLoRA on the Open-Platypus-Chat dataset, supporting conversational tasks.
Large Language Model
Transformers English

J
mlabonne
21
39
MGM 7B
MGM-7B is an open-source multimodal chatbot built on Vicuna-7B-v1.5, supporting high-definition image understanding, reasoning, and generation.
Text-to-Image
Transformers

M
YanweiLi
975
8
Mixtral Chat 7b
MIT
A merged model created with the mergekit tool from several Mistral-7B variants, focused on text generation tasks.
Large Language Model English
M
LeroyDyer
76
2
Swallow MX 8x7b NVE V0.1
Apache-2.0
Swallow-MX-8x7b-NVE-v0.1 is a Mixture of Experts model continually pretrained from Mixtral-8x7B-Instruct-v0.1, primarily enhancing its Japanese capabilities.
Large Language Model
Transformers Supports Multiple Languages

S
tokyotech-llm
1,293
29
Mixtral 8x7B Holodeck V1 GGUF
Apache-2.0
A GGUF-format model fine-tuned from Mixtral 8x7B and designed specifically for Koboldcpp; its training data includes approximately 3,000 multi-genre e-books.
Large Language Model English
M
KoboldAI
376
15
Orthogonal 2x7B V2 Base
orthogonal-2x7B-v2-base is a Mixture of Experts model combining Mistral-7B-Instruct-v0.2 and SanjiWatsuki/Kunoichi-DPO-v2-7B, specializing in text generation tasks.
Large Language Model
Transformers

O
LoSboccacc
80
1
Air Striker Mixtral 8x7B Instruct ZLoss 3.75bpw H6 Exl2
Apache-2.0
An experimental merge fine-tuned from Mixtral-8x7B-v0.1, supporting an 8K context length and the ChatML prompt format.
Large Language Model
Transformers English

A
LoneStriker
49
9
Sauerkrautlm Mixtral 8x7B GGUF
Apache-2.0
SauerkrautLM Mixtral 8X7B is a multilingual text generation model based on the Mixtral architecture. It has been fine-tuned and aligned using SFT and DPO, and supports English, German, French, Italian, and Spanish.
Large Language Model
Transformers Supports Multiple Languages

S
TheBloke
403
8
Nllb Moe 54b 4bit
NLLB-MoE is a Mixture of Experts machine translation model developed by Meta, supporting 200 languages and ranking among the most capable open-access machine translation models available (a minimal translation sketch follows this entry).
Machine Translation
Transformers Supports Multiple Languages

N
KnutJaegersberg
17
5
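NLLB models, including the MoE variant, are seq2seq translators in which the target language is chosen by forcing its language code as the first generated token. The sketch below uses the small distilled NLLB checkpoint as a stand-in (an assumed model ID; the 54B MoE 4-bit build follows the same pattern but needs quantization-specific loading that varies by repository):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Small stand-in checkpoint; the 54B MoE variant uses the same interface.
model_id = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer(
    "Mixture of Experts routes each token to a few experts.",
    return_tensors="pt",
)

# Force the decoder to start with the target-language code (here: French).
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_new_tokens=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```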
Switch C 2048
Apache-2.0
A Mixture of Experts (MoE) model trained on masked language modeling, with a parameter count of 1.6 trillion. It uses a T5-like architecture but replaces the dense feed-forward layers with sparse MLP (expert) layers (a masked span-filling sketch follows this entry).
Large Language Model
Transformers English

S
google
73
290
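Switch Transformers follow the T5 text-to-text interface, so masked spans are marked with sentinel tokens and the model fills them in. The sketch below uses a small public Switch checkpoint as a stand-in (an assumed model ID) for the 1.6-trillion-parameter Switch-C model, which is far too large to load this way:

```python
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

# Small stand-in; google/switch-c-2048 shares the interface but is ~1.6T params.
model_id = "google/switch-base-8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = SwitchTransformersForConditionalGeneration.from_pretrained(model_id)

# T5-style masked language modeling: sentinel tokens mark the spans to predict.
text = (
    "A Mixture of Experts layer routes each token to <extra_id_0> experts "
    "chosen by a learned <extra_id_1>."
)
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```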